High Performance Data Analysis via Coordinated Caches

Authors

  • M Fischer
  • T Hauth
Abstract

With the second run period of the LHC, high energy physics (HEP) collaborations face growing demands on their computing infrastructure. Opportunistic resources are expected to absorb many computationally expensive tasks, such as Monte Carlo event simulation. This leaves dedicated HEP infrastructure with an increased load of analysis tasks, which in turn need to process an increased volume of data. In addition to storage capacity, a key factor for future computing infrastructure is therefore the input bandwidth available per core. Modern data analysis infrastructure relies on one of two paradigms: data is either kept on dedicated storage and accessed via the network, or distributed over all compute nodes and accessed locally. Dedicated storage allows data volume to grow independently of processing capacity, whereas local access allows processing capacity to scale linearly. With growing data volume and processing requirements, however, HEP will require both of these features. To enable adequate user analyses in the future, the KIT CMS group is merging both paradigms: popular data is spread over a local disk layer on the compute nodes, while any data remains available from an arbitrarily sized background storage. This concept is implemented as a pool of distributed caches, which are loosely coordinated by a central service. A Tier 3 prototype cluster is currently being set up for performant user analyses of both local and remote data.
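The combination of a local cache layer, a background storage, and a coordinating service can be illustrated with a minimal sketch. It is not the KIT implementation: the class `CachePool`, its methods, the popularity measure (a plain access count), and the placement rule (most popular files first, onto the node with the most free slots) are all assumptions made for illustration.

```python
from collections import Counter


class CachePool:
    """Hypothetical pool of per-node caches in front of a background storage.

    All names and policies here are illustrative assumptions, not the
    design described in the abstract.
    """

    def __init__(self, storage, node_capacity):
        # storage: file name -> content, standing in for the background storage
        # node_capacity: node name -> number of files its local disk can cache
        self.storage = storage
        self.capacity = dict(node_capacity)
        self.caches = {node: {} for node in node_capacity}
        self.access_counts = Counter()

    def read(self, node, name):
        """Read a file on a node; return (content, 'local' or 'remote')."""
        self.access_counts[name] += 1
        if name in self.caches[node]:
            return self.caches[node][name], "local"
        # Any file stays reachable from the background storage.
        return self.storage[name], "remote"

    def locate(self, name):
        """Return the node caching this file, or None if it is not cached."""
        for node, cache in self.caches.items():
            if name in cache:
                return node
        return None

    def rebalance(self):
        """Coordinator step: spread the most accessed files over the nodes."""
        for cache in self.caches.values():
            cache.clear()
        free = dict(self.capacity)
        for name, _count in self.access_counts.most_common():
            # Node with the most free slots; ties go to the first name.
            node = max(sorted(free), key=free.get)
            if free[node] == 0:
                break  # every local cache is full
            self.caches[node][name] = self.storage[name]
            free[node] -= 1
```

In this sketch a scheduler could call `locate` to send a job to the node that already holds its input, and fall back to any node (reading remotely) when `locate` returns `None`.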


Similar resources

Selection Policy: Fighting against Filter Effect in Network of Caches

Many Information Centric Networking (ICN) proposals use a network of caches to bring content closer to consumers, reduce the load on producers, and decrease unnecessary retransmissions for ISPs. Nevertheless, existing cache management schemes for networks of caches perform poorly. The main reason for performance degradation in a network of caches is the filter effect o...


Distributed caching with centralized control

The benefits of using caches for reducing traffic in backbone trunk links and for improving web access times are well-known. However, there are some known problems with traditional web caching, namely, maintaining freshness of web objects, balancing load among a number of caches and providing protection against cache failure. This paper investigates in detail the advantages and disadvantages of a ...


C3D: Mitigating the NUMA bottleneck via coherent DRAM caches

Massive datasets prevalent in scale-out, enterprise, and high-performance computing are driving a trend toward ever-larger memory capacities per node. To satisfy the memory demands and maximize performance per unit cost, today’s commodity HPC and server nodes tend to feature multi-socket shared memory NUMA organizations. An important problem in these designs is the high latency of accessing mem...


Coordinated Placement and Replacement for Large-Scale Distributed Caches

In a large-scale information system such as a digital library or the web, a set of distributed caches can improve their effectiveness by coordinating their data placement decisions. Using simulation, we examine three practical cooperative placement algorithms including one that is provably close to optimal, and we compare these algorithms to the optimal placement algorithm and several cooperativ...


Memory Performance Profiling via Sampled Performance Monitor Event Traces

Memory performance can be studied, process behavior can be characterized, and application performance can be improved through the use of sampled performance monitor event traces. As an example, this paper demonstrates how sampled traces of the TPC-C benchmark executed on eight- and 32-processor configurations of the IBM eServer pSeries 690 (p690) are analyzed to identify the resolution sites of l...




Publication date: 2015